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Abstract 

Direction of arrival (DOA) estimation is a classical problem in signal processing with many practical appli- 
cations. Its research has recently been advanced owing to the development of methods based on sparse signal 
reconstruction. While these methods have shown advantages over conventional ones, there are still difficulties in 
practical situations where true DOAs are not on the discretized sampling grid. To deal with such an off-grid DOA 
estimation problem, this paper studies an off-grid model that takes into account effects of the off-grid DOAs and 
has a smaller modeling error An iterative algorithm is developed based on the off-grid model from a Bayesian 
perspective while joint sparsity among different snapshots is exploited by assuming a Laplace prior for signals 
at all snapshots. The new approach applies to both single snapshot and multi-snapshot cases. Numerical simu- 
lations show that the proposed algorithm has improved accuracy in terms of mean squared estimation error. The 
algorithm can maintain high estimation accuracy even under a very coarse sampling grid. 

1 Introduction 

Source localization using sensor arrays has been an active research area for decades. This paper focuses on the 
narrowband far-field source case where the wave front is assumed to be planar and the angle/direction information 
is to be estimated, known as the direction of arrival (DOA) estimation problem. MUSIC [2 | is the most successful 
method among conventional DOA estimation techniques. It has been proven to be a realization of the maximum 
likelihood method in the case of a large number of snapshots and uncorrected source signals |[3|. The research on 
DOA estimation has been advanced in recent years owing to the development of methods based on sparse signal 
reconstruction (SSR) or compressed sensing (CS) Q. In these methods, e.g., ^i-SVD |5|, a fixed sampling grid is 
selected firstly that serves as the set of all candidates of DOA estimates. Then by assuming that all true (unknown) 
DOAs are exactly on the selected grid, a SSR problem can be formulated where the DOAs of interest constitute the 
support of the sparse signal to be recovered. 

In the case of a single measurement vector (SMV, or a single snapshot), £i optimization |j6| is a favorable 
approach to the sparse signal recovery due to its guaranteed recovery accuracy. It has been proven in 1 6 1 that a 
sparse signal can be accurately recovered under a so-called restricted isometry property (RIP) condition that requires 
all columns of the measurement matrix be highly incoherent. In the case of multiple measurement vectors (MMV), 
the sparse signals at all snapshots share the same support. It has been shown in [7 ] that such joint sparsity can be 
exploited to improve the averaged recovery success probability under a similar incoherent matrix condition. Sparse 
Bayesian inference/learning (SBI) [8] is another popular method for the sparse signal recovery in CS. In SBI, the 
signal recovery problem is formulated from a Bayesian perspective while the sparsity information is exploited by 
assuming a sparse prior for the signal of interest. As an example, a Laplace signal prior leads to a maximum a 
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posteriori (MAP) optimal estimate that coincides with an optimal solution to the £i optimization 1'9','TOl. In the 
MMV case, the joint sparsity among different (uncorrelated) snapshots is utilized by assuming the same sparse prior 
for the signals at all snapshots 1 1 1 1. Correlations between snapshots have also been studied in a recent paper 1 12 1. 
One merit of SBI is its flexibility in modeling sparse signals that can not only promote the sparsity of its solution, 
e.g., in fTOl, but also exploit the possible structure of the signal to be recovered, e.g., in |T3]. Since the Bayesian 
inference is a probabilistic method and based on heuristics to some extent, one shortcoming of SBI is that it offers 
less guarantee on the signal recovery accuracy as compared with, for example, ii optimization. 

Recent advancements in array signal processing include compressive (CS-) MUSIC |14| and subspace-augmented 
(SA-) MUSIC [ 15|. They are combinations of the conventional MUSIC technique and recent CS methods with guar- 
anteed support recovery performance and can outperform MUSIC and standard CS approaches. Though existing 
CS-based approaches have shown their improvements in DOA estimation, e.g., their success in the case of limited 
snapshots, there are still difficulties in practical situations where the true DOAs are not on the sampling grid. On 
one hand, a dense sampling grid is necessary for accurate DOA estimation to reduce the gap between the true DOA 
and its nearest grid point since the estimated DOAs are constrained on the grid. On the other hand, a dense sampling 
grid leads to a highly coherent matrix that violates the condition for the sparse signal recovery. We refer to the 
model adopted in the standard CS methods as an on-grid model hereafter in the sense that the estimated DOAs are 
constrained on the fixed grid. 

The first off-grid model for DOA estimation was proposed in |rT6| where the estimated DOAs are no longer 
constrained in the sampling grid set. The model takes into account the basis mismatch in the measurement matrix 
caused by the off-grid DOAs. It has been shown that the sparse total least squares (STLS) solver proposed in 1 16| 
can yield an MAP optimal estimate if the matrix perturbation cased by the basis mismatch is Gaussian. It is shown, 
however, that the Gaussian condition cannot be satisfied in the off-grid DOA estimation problem and hence a new 
solver is needed. 

In this paper, we propose a Bayesian algorithm for DOA estimation based on the off-grid model that applies 
to both SMV and MMV cases. The off-grid distance (the distance from the true DOA to the nearest grid point) 
that lies in a bounded interval is assumed to be uniformly distributed rather than Gaussian as in [16|. We refer to 
our algorithm as off-grid sparse Bayesian inference (OGSBI) in the body of the paper. We then incorporate in our 
algorithm an idea in Q, using the singular value decomposition (SVD) to reduce the computational workload of the 
signal recovery process and the sensitivity to noise, namely OGSBI-SVD. Our approach is a spectral-like method 
with the value of the spectrum at each DOA being the estimated source power from such a direction. We show 
by numerical simulations that the proposed method has a smaller mean squared error (MSE) in comparison with 
£i-SVD [5 1. Indeed, the proposed method can exceed a lower bound of MSE that is shared among all on-grid model 
based methods including CS-MUSIC and SA-MUSIC. It is also shown that the proposed method is more accurate 
and faster than STLS. Moreover, the proposed method can estimate DOAs accurately even under a coarse sampling 
grid. 

Notations used in this paper are as follows. Bold-face letters are reserved for vectors and matrices, x, and 
are complex conjugate, transpose and conjugate transpose of x, respectively. ||-||^, ||-||2 and ||-||p denote the 
li norm, ^2 norm and Frobenious norm, respectively. Tr {A} are the determinant and trace of a matrix A 
respectively. Xj is the jth entry of a vector x. Aj, A^ and Aij are the jth column, jth row and {i,j)th entry of a 
matrix A, respectively, diag (A) is a column vector with its entries being the diagonal elements of a matrix A, and 
diag (a;) is a matrix with its diagonal elements being entries of a vector x. x'{9) is the derivative of x{9) with respect 
to 6. 'R and 9 stand for the real and imaginary parts of a complex variable respectively, a; y is the Hadamard 
(element-wise) product of x and y. x denotes an estimate of x. 

The rest of the paper is organized as follows. Section [2] studies the off-grid model used in this paper. Section [3] 
introduces the proposed OGSBI and OGSBI-SVD algorithms. Section [4]presents our simulation results. Conclusion 
and future work are given in Section [5] 
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2 Off-grid DOA Estimation Model 



Consider K narrowband far-field sources Sk{t),k = 1, - ■ ■ ,K, impinging on an array of M omnidirectional sensors 
from directions 0^, k = 1, • • • ,K. Time delays at different sensors can be represented by simple phase shifts, 
leading to the observation model |[T| : 



y{t) = A{e)s{t) + e{t), t = l, 



(1) 



where = [yi{t),--- ,yM{t)Y , 6 = [6'i,-- - = [si{t),--- ,SK{t)f, e{t) = [ei{t),--- ,eM{t)Y , 

and ym{t) and em(i), = 1, • • • , M, are the output and measurement noise of the mth sensor at time t respectively. 
The matrix A{0) = [a {6i) , • • • , a {Or)] is an array manifold matrix and a {9k) is called steering vector of the A;th 
source. The entry {Ok) contains the delay information of the fcth source to the mth sensor. In this paper, we 



assume that the number of sources K is already known. Readers are referred to 1 17 1 for the estimation of K which 
is beyond the scope of this paper. So, the goal is to find the unknown DOAs 9 given K, y{t) and the mapping 
6 — A{0). For simplicity, we consider only the 2D case in this paper, i.e., all sources and sensor arrays lie in the 
same plane. In the following we re-derive the off-grid model proposed in p6| using linear approximation and further 
show its relationship with the on-grid one. 

Let = l^'i, • • • , ^Ar| be a fixed sampling grid in the range of all possible DOAs, where N denotes the grid 

number and typically satisfies N ^ M > K. Suppose 9k ^ • " " ) | for some A; G {1, • • • , K} and that On^^ 
nfc G {1, • • • , N}, is the nearest grid point to Ok- We approximate the steering vector a {Ok) using linearization: 



a(9nj+h 



with b { 9. 



a 



'rik 



Denote A 



'rik 



9k -0. 



(2) 



,B 



bio 



,b(9N 



,/3 = [/3i,--- ,M 



diag (/3) and * = A + BA, where for n = 1, 



,N, 



f3n 



Ok -Ojik, if n = Uk for any G {1, 
0, otherwise, 



(3) 



with Hk G {!,••• ,N} and 9n^. being the nearest grid to a source 9k, k 
$ = $ {(3). The observation model in ([T]) can be rewritten intc[^ 

y{t) = ^x{t) + e{t), t = !,■■■, J 



G {1, • • • , K}. For brevity we write 



(4) 



which is the off-grid model to be used in this paper. 

It should be noted that the off-grid model in (|4]) is closely related to the on-grid one that can be obtained by setting 
/3 = in (|4]) becomes A in such a case). In fact, the matrix $ can be considered as the first order approximation 
of the true "overcomplete" array manifold matrix and A is the zeroth order approximation. As a result, the off-grid 
model has a much smaller modeling error than the on-grid one. Such an advantage is twofold. First, by adopting the 
same sampling grid the off-grid model results in higher accuracy, especially in the case of a low measurement noise 
where the modeling error is the dominant modeling uncertainty. Second, a coarser sampling grid can be adopted in 
the off-grid model to achieve a considerably reduced computational workload with a comparable modeling accuracy. 

To estimate the DOAs 6 we need to find not only the support of the sparse signals x{t),t = 1, • • • , T, but also 
the off-grid difference f3. In this paper, we formulate the problem based on a Bayesian perspective and develop an 
iterative algorithm to estimate x{t), t = I, ■ ■ ■ ,T, as well as /3, in the following section. 



'The linear approximation error introduced in lj2| can be sufficiently small and neglected when a dense enough grid 9 is selected. 
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3 OGSBI: Off-grid Sparse Bayesian Inference 



We assume that is a uniform grid with a grid interval r = 62 — Oi for simplicity of exposition. Then we have 
f3 S [—^r, ^r] ^ . The only difference is the domain of (3 in the case of a general grid. Without loss of generality, 
we consider complex-valued signals throughout the paper since the matrix $ is complex-valued. We derive our 
algorithm in the MMV case. The SMV is a special case by simply setting T = 1. Denote Y = [y{l), - ■ ■ , y{T)], 
X = [x{l), ■■■ , x{T)] and E = [e(l), • • • , e{T)]. The off-grid DOA estimation model in Q becomes 

Y = *X + E (5) 

with * = A + BA, Y,E £ C*^^^, X e C^^^, A,B,^ e C^^^^ and f3 G [-^r, ^r]^ . The matrix X of 
interest is jointly sparse, i.e., all columns of X are sparse and share the same support. 

3.1 Sparse Bayesian Formulation 
3.1.1 Noise model 



Under an assumption of white (circular symmetric) complex Gaussian 1 18 1 noises, we have 

T 

p{E\ao) = llCAf{e{t)\0,a^'l) (6) 

t=i 

where oq = cr^^ denotes the noise precision with cr^ being the noise variance, the probability density function 
(PDF) of a (circular symmetric) complex Gaussian distributed random variable u ~ CM (/x, S) with mean and 
covariance 5] is fTS) 

CAr(w|/x,5]) = -J^^exp{-iu-tJ,f^-^{u-fi)Y (7) 

Then we have 

T 

p{Y\X,ao,P) = l[CAf {y{t)\^x{t),ao'l) . (8) 

t=i 

In the case where the noise precision ao is unknown that is assumed in this paper, a Gamma prior is assumed for 
ao since it is a conjugate prior of the Gaussian distribution: 

p {ao; c, d) = T {ao\c, d) (9) 

where T {u\c, d) = [T [c)]'^ ^ exp {— dn} for a random variable u € M and F (•) is the Gamma function. We 
set c, d — )• as in ll8}[T9| to make the Gamma prior noninformative. 

3.1.2 Sparse signal model 

A sparse prior is needed for the jointly sparse matrix X of interest. We assume that signals among snapshots are 
independent and adopt the following two-stage hierarchical prior: 

p{X-p) = jp{X\a)p {a; p) da (10) 
where p > 0, q G M^, A = diag (a) and 

T 

p{X\<x) = llCAr{x{t)\0,A), (11) 

t=l 

N 

p{a;p) = l[T{an\l,p). (12) 

n=l 
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It is easy to show that all columns of X are independent and share the same prior. According to fTU\, for t = 
1, • • • , T both K {x{t)} and 9 {x{t)} are Laplace distributed and share the same PDF that is strongly peaked at the 



origin. As a result, the two-stage hierarchical prior defined by ( [TO] ) is a sparse prior that favors most rows of X being 
zeros. 

3.1.3 Off-grid distance model 

By 13 £ [—^f, ^f] ^> we assume that /3 is uniformly distributed to make the prior noninformative: 



1 1 

2 ' 2 



(13) 



By combining the stages of the hierarchical Bayesian model, the joint PDF is 

p{X,Y,ao,a,P) = p (Y\X , oq, (3) p {X\a) p (a) p (ao) p (p) (14) 
with the distributions on the right hand side as defined by ([8]), ( [TT] ), ([12]), (]9]) and ( [T3[ ) respectively. 

3.2 Bayesian Inference 

An evidence procedure f20'| is exploited to perform the Bayesian inference since the exact posterior distribution 
p {X , ao, a, P\Y) cannot be explicitly calculated. Similar approaches have been used in standard Bayesian CS 
methods, e.g., pO][T9| . First it is easy to show that the posterior distribution of X is a complex Gaussian distribution: 

T 

p{X\Y,ao,a,p) = l[m{x{t)\fi{t),^) (15) 
t=i 

with 

= aoS*^y(t), t = l,---,r, (16) 
S = (ao*^* + A"^)"^ • (IV) 

Calculations of S and t = 1, ■ ■ ■ ,T, need estimates of the hyperparameters ao, a and /3. In an evidence 
procedure, they are estimated using an MAP estimate that maximizes p {oq, a, P\Y). It can be easily observed 
that to maximize p (ao, ct, P\Y) is equivalent to maximizing the joint PDFp {Y, uq, a, /3) = p (qq, a, (3\Y) p (Y) 
since p [Y) is independent of the hyperparameters. An expectation-maximization (EM) algorithm is implemented 
that treats X as a hidden variable and turns to maximizing E {logp {X, Y, ao, a, fS)}, where p {X, Y, oq, a, (3) 



is given in ( 14) and E {■} denotes an expectation with respect to the posterior of X as given in ( 15 ) using the current 
estimates of the hyperparameters. 

Denote U = [/x(l),--- , n{T)] = ao'S^^Y, X = X/VT, Y = Y/VT, U = U/Vf and p = p/T. 
Following a similar procedure as in fSl, it is easy to obtain the following updates of cx and ao- 



1 + 4££;|||X"||^| - 1 

= ^ ^ , n = l,---,A^, (18) 

ip 



"O = r ^^i ' (19) 



E \ \\^ - ^X\\l\ + d/T 

where = \\U^l + ^nn, E [\\Y- ^X\\l^ = HZ - + a^ ^ ^^^i 7„ with 7„ = l-a'^E 
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For (3, its estimate maximizes E {logp {Y\X, ao,f3)p (/3)} by ( 14 1 and thus minimizes 
Ei^^Y.\\y{t)-{A + BA)x{t)\\l^ 

- ^ Mt) - (A + BA) f^ml + Tr{{A + BA) S ( A + BAf] 



(20) 

N ■ \\ii(t) - (A-\- RA) „.(t)\\t -\-TrUA^ RA) A -\- RA)^ \. 

T 

t=i 

=/3^P/3 - 2v^/3 + C 
where C is a constant term independent of /3, P is a positive semi-definite matrix and 

P = K i^B^B (W • + S) } , (21) 

T 

T 

-K {diag [B^A-E] } . (22) 



i5]diag(/.(t))p^ (y(O-AMt)) 



The detailed derivation of ( 20 1 is provided in Appendix for simplicity of exposition. As a result, we have 



/3"<^"' = arg mill {p^PP-2v^p} . (23) 



/3e|-|r,|r| 



Remark 1 Though an explicit expression of f3'^^'^ cannot be given, by recognizing that (3 is jointly sparse with x, 
the dimension of (3 can be reduced to K in the computation and hence /3"^"' can be efficiently calculated. We provide 
the details in Subsection\3.5\ 



The proposed OGSBI algorithm is implemented as follows. After initializations of the hyperparameters a, 



and (3, we calculate Xl and t = 1, ■ ■ ■ ,T, using the current values of the hyperparameters according to ( 17 1 



and ( 16 1 respectively. Then we update a, ao and /3 according to ( 18 1, ( 19 1 and (23 1 respectively. The process is 
repeated until some convergence criterion is satisfied. We note that OGSBI has guaranteed convergence since the 
function p (ao, a, (3\Y) is guaranteed to increase at each iteration by the property of EM algorithm 1 ^21] . 

3.3 OGSBI-SVD 

In this subsection we introduce a subspace-based idea used in [5] that uses the SVD of the measurement matrix 
Y = USV^ to reduce the computation of the signal reconstruction process and the sensitivity to the measurement 
noise. Then we incorporate it into our OGSBI algorithm. Consider the noise-free case where Y = with K <T. 
We have Rank (Y) < Rank (X) < K. Let V = [Vi V2], where Vi and V2 are matrices that consist of the first 
K and the rest T — K columns of V respectively. Then we have that Y sv = YVi G C^^^^ preserves all signal 
information. In a general case where noises exist, by the SVD we have YV = \Y sv YV2], i.e., the observation 
matrix is decomposed into a signal subspace and a noise subspace denoted by Y sv and YV2 respectively. The 
signal subspace preserves most signal information and is to be used in the following signal recovery process while 
the noise subspace is abandoned. Denote Xsv = XVi and Esv = EVi. Then we have 



sv 



^Xsv + Esv (24) 



In (24 1, Y SV', ^sv and Esv can be viewed as the new matrices of sensor measurements, source signals and 



measurement noises respectively. The joint sparsity still holds in Xsv- We neglect possible correlations that exist 
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between columns of Xsv (and in Esv)^ we still assume that Xsv (and Esv) have independent columns]^ It 
is then straightforward to apply the proposed OGSBI algorithm to estimate X^y, /3 and then the DOAs. We use 
OGSBI-SVD to refer to the resulting algorithm. 

Based on implementation details to be introduced in Subsection 3.5 it can be shown that OGSBI-SVD has a 
computational complexity of order O [MN"^) per iteration while that for OGSBI is O (max [MN'^, MNT)) per 
iteration. An additional computational workload of order O (max (M^T, MT^)) is for the SVD of Y in OGSBI- 
SVD. Since it is empirically found that OGSBI-SVD converges much faster than OGSBI, the whole computational 
workload of OGSBI-SVD is less than that of OGSBI in general]^ 



3.4 Source Power and DOA Estimation 



We use the estimated source powers from different directions to form a spectrum of the proposed algorithm. In the 
following we derive a formula to estimate the source powers. We take OGSBI-SVD as an example. The case of 
OGSBI is similar with some modifications. Let X = XsvV^ be an estimate of the signal X. Then consider X 
row by row and we have X ~ CM (lA Vi , Y,nnViVi^ where we use VI and X) to denote the final estimates of 
the mean and co variance of Xsv respectively. We use the expectation as an estimate of the power p„ from direction 
6n (with a modification of /3„): 



E{pn} 



1 



-E 



1 

f 
1 

f 



E 



{-"} 



X 



+ E 



X 



E 



{-"} 



(25) 



u 



+ 



T 



Like other spectral-based methods, the DOAs are estimated using the locations of the highest peaks of the 
spectrum. Suppose that the grid indices of the highest K peaks of p are n^, /c = 1, • • • ,K. The estimated K DOAs 
will be Ok = h, + Pn,,k = !,■■■ ,K. 



3.5 Implementation Details 

This subsection presents some details of our implementations of OGSBI and OGSBI-SVD. At each iteration of 
OGSBI or OGSBI-SVD, an x matrix inversion is required when updating 51 according to ( [TT] ). By Af < 
the Woodbury matrix identity is applied to give 

S = A-A*^C"^*A (26) 

with C = Uq^I + *A$^ e C^^^*^. 

By the fact that (3 is jointly sparse with x{t) whose K nonzero entries correspond to the locations of the K 
sources, we calculate only entries of /3 that correspond to locations of the maximum K entries of a and set others to 
zeros. As a result, (3, P and v can be truncated into dimension of K or K x K. We still use /3, P and v hereafter to 

^It is empirically found that the resulting algorithm based on such an assumption performs well in practice, as reported in Section|4] The 
correlations between columns of the signal matrix (Xsv in our case) have recently been studied in jl2| . 

possible exception happens in the case ofT^N where the computation for the SVD is quite heavy. A modified approach in such 
a case is to partition Y firstly into blocks with each of about columns, then operate the SVD on each block and keep the resulting signal 
subspaces, and finally do another SVD on the new signal matrix composed of all signal subspaces. A model similar to i24\ can be cast. 
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denote their truncated versions for simplicity. By ( 23 1 and ^ { 0^Pf3 - 2v'^(3] = 2 {P(3 - v) we have = /3 
if P is invertible and ^ = P^v G [— 5 



2^^' 2^. 



8/3 

Otherwise, we update (3 elementwise, i.e., at each step we update 



one /3n by fixing up the other entries of (3. For n = 1, ■ ■ ■ , N, first we let 



Pr, 



(27) 



where is u without the nth entry for a vector u. Then by constraining /3„ G 5^] we have 



k 2^ 



r, if /3„ < 
otherwise. 



-2r; 



(28) 



We terminate OGSBI and OGSBI-SVD if 



< r or the maximum number of iterations is reached, 



where r is a user-defined tolerance and the superscript i refers to the iteration. 



4 Simulation 

In this section, we present our numerical results for the DOA estimation. A standard uniform linear array (ULA) of 
M = 8 sensors is considered. The origin is set at the middle point of the ULA to reduce the approximation error in 
(2 1. So we have Amn = exp |j7r (m — ^^y^) cos6'„| and Bmn = —Jt^ {'m — ^^^y^) sin6'„ • Amn, m = 1, - ■ ■ , M, 

n = 1, • • • ,N, with j = \/^. A uniform sampling grid {0°, r, 2r, • • • , 180°} is considered with r being the grid 
interval. The number of snapshots is set to T = 200 in the case of MMV. We consider only OGSBI-SVD in the case 
of MMV since it is empirically observed to converge faster and be more accurate in comparison with the performance 
of OGSBI. In OGSBI-SVD, we set p = 0.01 and c = d = 1 x 10~^. We initialize = r-r, 

Et=i yar{(Ysv)t\ 

cx = jjj^ '}2d=i \ sv)t\ ^i^^ /3 = 0, where |-| applies elementwise. We set r = 10^'^ and the maximum 

number of iterations to 1000. We note that the proposed algorithm is insensitive to the initializations oiaQ,cx and (3, 
as well as to pii p is not too large. All experiments are carried out in Matlab vJJ.O on the same PC with a Windows 
XP system and a 3GHz CPU. 



4. 1 Comparison with i 1 -S VD 

We take £i-SVD in Q as a representative of on-grid model based methods and compare OGSBI-SVD with it in 
terms of mean squared error (MSB) and computational time. We firstly study the MSEs of OGSBI-SVD and ii- 
SVD with respect to the grid interval r and SNR. In our experiment, we consider SNR = lOdB and OdB, and 
r = 0.5°, 1°, 2° and 4°. In each trial, K = 2 sources 61, 62 are uniformly generated within intervals [58°, 62°] and 
[86°, 90°] respectively. For each combination (SNR, r), the MSB is averaged over R = 200 trials: 

MSE=^EX;(^^-^^)' (29) 

1=1 k=l 

where the superscript i refers to the ith trial. It should be noted that there exists a lower bound for the MSB of 
^i-SVD regardless of the SNR since the best DOA estimate that £i-SVD can obtain is the grid point nearest to 
the true DOA. In fact, the lower bound is shared among all on-grid model based methods. By assuming that the 
true DOA is uniformly distributed, the lower bound can be easily calculated as LB = r^/12|^ Fig. [l]presents our 

''The presented lower bound is, in fact, tiie expectation in the case of limited trials. The variance approaches zero as the number of trials 
gets large. 
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■e — Lower bound for L1-SVD 




1 1 , , , , , , , 1 

0.5 1 1.5 2 2.5 3 3.5 4 

Grid interval (degrees) 



Figure 1: MSEs of OGSBI-SVD and h-SYD. The lower bound is for ^i-SVD regardless of the SNR. 
Table 1: Averaged Time Consumptions of £i-SVD and OGSBI-SVD With Respect to SNR and r. Time unit: sec. 
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OGSBI-SVD 
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r = 0.5° 


r = 1° 


r = 2° 


r = 4° 


£i-SVD 


0.413 


0.295 


0.218 


0.190 


OGSBI-SVD 


10.9 


0.831 


0.104 


0.024 



experimental results. In all scenarios under consideration, OGSBI-SVD has more accurate DOA estimation than 
^i-SVD. Moreover, OGSBI-SVD can exceed the lower bound for ^i-SVD in most scenarios. The phenomenon is 
significant in the case of a higher SNR or a coarser sampling grid where the on-grid model has a poor performance 
on describing the true observation model while it is overcome to a large extent by the off-grid model used in this 
paper. 

Table [T] presents the averaged CPU times of OGSBI-SVD and £i-SVD (excluding the SVD process that takes 
about 0.003s in our case) with respect to SNR and rj^For both OGSBI-SVD and ^i-SVD, their CPU times decrease 
as the grid gets coarser. OGSBI-SVD is faster than ^i-SVD at r = 2° and 4°. One drawback of the proposed method 
is that it is slow in the case of a dense sampling grid. In practice, we recommend to use a coarser grid with r = 2° 
for the proposed algorithm since it can give an accurate yet fast DOA estimation. 

Finally, we emphasize that the advantage of the proposed method on DOA estimation accuracy is not limited to 
the comparison with ^i-SVD. Indeed, the MSE of OGSBI-SVD can exceed the lower bound that is shared among 



all on-grid model based methods including CS-MUSIC, SA-MUSIC and the algorithm in 1 12 1. 



Remark 2 Another advantage of the proposed algorithm is its smaller DOA estimation bias in comparison with that 
ofii-SVD, especially in the case of closely spaced sources. Due to the page limit, we do not present our simulation 
results. 



^The code of ^i-SVD is provided by the author of jsj. We note that its speed can be accelerated using state-of-the-art algorithms for CS. 
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Table 2: Averaged MSEs and CPU Times of STLS and OGSBI in the SMV Case With Respect to r 





MSE (dB) 


Time (sec) 




r = 2° 


r = 4° 


r = 2° 


r = 4° 


STLS 


-36.5 


-36.6 


5.31 


1.77 


OGSBI 


-45.1 


-43.3 


0.098 


0.028 



Table 3: Averaged MSEs of OGSBI-SVD in the Presence of Outliers, k denotes a ratio by which outliers are 
augmented. 



n 


1 


5 


10 


20 


50 


100 


MSE(dB) 


-63.1 


-62.9 


-47.6 


-38.6 


-34.6 


-32.0 



4.2 Comparison with STLS 

The off-grid model has recently been used in fTF) for DOA estimation. In fT6l, a sparse total least-squares (STLS) 
approach is proposed. In the SMV case, STLS seeks to solve the nonconvex optimization problem 



mm ■ 
x,l3 



[ml + \\y-{A + B/:^)x\\l + \\\x\\^] 



(30) 



where x is the sparse source signal of interest, y is the noisy measurement. A, B, A and (3 are the same as defined 
in the off-grid model, and A > is a regularization parameter. From the Bayesian perspective, this is equivalent to 
seeking for an MAP solution of (a;, (H) by assuming that the measurement noise is white Gaussian, x is Laplacian 
and /3 is Gaussian. It is noted that the last assumption for /3 cannot properly capture the property of (3. A local 



minima of the problem in (30 1 is achieved in fl^ by an alternating approach, i.e., alternatively solving x with a 
fixed (3, which requires a solution to an £i -regularized least square problem, and solving j3 with a fixed x, which 
requires a solution to an N dimensional linear system. As argued in |[5| the SVD used in OGSBI-SVD can alleviate 
the sensitivity to the measurement noise in the MMV case that is not used in [16]. To make a fair comparison, we 
consider only the SMV case when comparing our method with STLS though a similar problem can be cast for STLS 
in the MMV case. In our implementation of OGSBI, we initialize ao 
settings are the same as those for OGSBI-SVD in the MMV case. 

In our experiment, we consider two DOAs from 63.2° and 90.3° with SNR = 20dB. We consider r = 2° and 4° 



100 
VaT{y} 



and OL 



1 

MK 



A^y\. The rest 



for both OGSBI and STLS. The parameter A in (30 1 is tuned to our best such that STLS achieves the smallest error. 
Table |2]presents the averaged MSEs and CPU times of STLS and OGSBI over R = 200 trials. OGSBI obtains more 
accurate DOA estimations than STLS in both the scenarios with remarkably less computational times. We also note 
that it is possible to accelerate STLS using state-of-the-art algorithms for CS. 



4.3 Sensitivity to Measurement Outliers 

The SVD procedure in OGSBI-SVD is related to the principal component analysis (PCA). As is known that the 
standard PCA is sensitive to outliers. Even a single corrupted measurement can deteriorate the quality of the ap- 
proximation. In this subsection we carry out experiments to check whether the proposed OGSBI-SVD is sensitive to 
measurement outliers due to the SVD. The experimental setup is similar to that in Subsection 4. 1 but with SNR = oo. 
After acquiring the noiseless measurements, we randomly choose 3 out of the MT = 1600 measurements, multiply 
by a constant ratio k and then save as the outliers. Beside the case of no outliers (ratio = 1) we consider five other 
cases where n is set to 5, 10, 20, 50 and 100 respectively. Table [3] presents our simulation results of the MSEs as 
defined in ( 29 1. It can be seen that the estimation accuracy of OGSBI-SVD can degrade significantly even with about 
0.2% measurements being corrupted due to the sensitivity of the SVD. 
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Note that the corrupted measurement matrix Y due to the outliers is a sum of a low-rank matrix (noiseless 
measurement matrix of rank K) and a sparse matrix (outliers). A robust PCA technique has recently been proposed 



in 1 22 1 that can recover the original low -rank matrix from the sparse outliers. So, it is possible to combine the robust 



PCA technique in |22| with the proposed OGSBI-SVD to improve its robustness to outliers. But due to the page 



limit we do not discuss the problem further in this paper. 

5 Conclusion and Future Work 

In this paper, we studied the off-grid DOA estimation model firstly proposed in p6| for reducing the modeling 
error due to discretization of a continuous range. We proposed an algorithm based on the off-grid model from a 
Bayesian perspective that is applicable to both single snapshot and multi-snapshot cases. A subspace-based idea 
was introduced to reduce the computational complexity of the signal recovery process and the sensitivity to noise. 
We illustrated by simulations that the proposed approach outperforms standard CS methods whose performance is 
limited by the underlying standard on-grid model. It is also more accurate than the algorithm proposed in |fT65 based 
on the off-grid model. 

One drawback of the proposed algorithm is its slow speed in the case of a dense sampling grid though a coarser 
grid can be adopted to obtain an accurate yet fast DOA estimation. Motivated by the fast alternatives developed 
for conventional on-grid model based algorithms, e.g., in flOl, it is a future work to develop fast versions of our 
algorithm. Like existing methods in Bayesian CS, the proposed algorithm is based on heuristics to some extent. Its 
estimation accuracy deserves further studies in the future. 



Appendix: Derivation of (20) 



Eq. (20 1 is based on the following two equalities: 



\y-{A + BA)fi\\l 
= ||(y-A/x)-S.diag(/x)./3||^ 
= /3^diag^ (/i) S^S-diag(/x)/3 
- 23? {(y - Afif B ■ diag ■ (s] + Ci 



= 0^ [b^B (3-2R {diag (7Z) B^ {y - A^i)f (3 + Ci, 

Tr I (A + BA) S (A + S A)^} 

= 2rr{5}? {AEAS^}} +Tr {BAEAS^} +C2 
= 2K{rr {S^ASA}} +rr{ASAB^B} +C72 



= 23f? {diag (B^ AS) } p + p'^ (^^QB^Bjp + C2 

where Ci, C2 are constants independent of (3, and the equality 

Tr {diag^ {u) Q ■ diag {v) ■ R^] = u" {Q Q R) v 

for vectors u, v and matrices Q, R with proper dimensions is used. Note that 0^ S(3 G M for a positive semi-definite 
matrix S with proper dimension and thus 



since /3 is real- valued. Then (20 1 is obtained by observing that both B^ B ^t/x^ and S B^ B are positive 
semi-definite. 
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